Distance-based Spatial Weights

Geospatial Analytics

Part 2 - Journey to find the “neighbours”

Jasper Lok https://jasperlok.netlify.app/
02-13-2023

Photo by Nadine Shaabana on Unsplash

Previously, we discussed contiguity-based spatial weights.

In this post, we will focus on another spatial weighting method: distance-based spatial weights.

What are distance-based spatial weights?

Rey, Arribas-Bel, and Wolf (2020) explain that distance-based weights define neighbour relations as a function of the distance separating spatial observations.

In other words, this approach decides who the neighbours are based on how far apart the areas are, rather than on whether the areas touch each other.

Types of distance-based spatial weights

Below are two common approaches to distance-based spatial weights:

K nearest neighbours weights

Rey, Arribas-Bel, and Wolf (2020) define this type of distance-based weight such that the neighbour set of an observation contains its k nearest observations, where the user specifies the value of k.
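To make the definition concrete, here is a minimal base-R sketch of the K nearest neighbours rule. It is not spdep's implementation: the helper `knn_indices` and the four toy points are hypothetical, and it assumes planar coordinates so that plain Euclidean distance applies. It returns a matrix whose row i lists the indices of the k points closest to point i, the same shape as the `nn` matrix that `knearneigh` produces in the demonstration.

```r
# Minimal sketch of k-nearest-neighbour selection (hypothetical helper, not spdep's code).
# Assumes planar coordinates, so plain Euclidean distance is appropriate.
knn_indices <- function(coords, k) {
  d <- as.matrix(dist(coords))  # pairwise Euclidean distances
  diag(d) <- Inf                 # a point is never its own neighbour
  n <- nrow(coords)
  nn <- matrix(NA_integer_, nrow = n, ncol = k)
  for (i in seq_len(n)) {
    # column indices of the k smallest distances in row i (ties broken by index)
    nn[i, ] <- order(d[i, ])[seq_len(k)]
  }
  nn
}

# Four hypothetical points on a line at x = 0, 1, 3 and 10
toy_coords <- cbind(x = c(0, 1, 3, 10), y = 0)
nn <- knn_indices(toy_coords, k = 2)
nn
```

Every row has exactly k entries, even for the far-away point at x = 10, whose two nearest neighbours are points 3 and 2. That is the defining property of this approach: the number of neighbours is fixed, while the distance to them varies.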

Fixed distance

As the name suggests, under this approach two districts are neighbours only if the distance between them falls within a specified range (a distance band).

We will see more about how fixed distance spatial weights work in the demonstration.
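A distance band can be sketched in base R in the same spirit. Again, this is an illustration under assumptions rather than spdep's code: the helper `distance_band` is hypothetical, distances are planar, and I treat the lower bound as exclusive and the upper bound as inclusive. The result is a list in which element i holds the indices of the neighbours of point i.

```r
# Hypothetical helper: neighbours of i are points j != i with lower < d(i, j) <= upper.
# Assumes planar coordinates (Euclidean distance).
distance_band <- function(coords, lower, upper) {
  d <- as.matrix(dist(coords))
  n <- nrow(coords)
  lapply(seq_len(n), function(i) {
    in_band <- d[i, ] > lower & d[i, ] <= upper & seq_len(n) != i
    unname(which(in_band))
  })
}

toy_coords <- cbind(x = c(0, 1, 3, 10), y = 0)  # hypothetical points
band <- distance_band(toy_coords, lower = 0, upper = 2)
lengths(band)  # number of neighbours per point: 1 2 1 0
```

Unlike the K nearest neighbours rule, the number of neighbours now varies, and the isolated point at x = 10 ends up with none. The demonstration runs into exactly this issue.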

Best practice on how to select the appropriate spatial weighting method

This was discussed in my previous post.

Please refer to the previous post for more info.

Demonstration

I will download the Malaysia shapefiles from this link.

For more explanation of shapefiles, please refer to my previous post.

Set up the environment

First, I will set up the environment by loading the necessary packages.

pacman::p_load(tidyverse, sf, spdep, tmap, janitor)

I will also set tmap_mode to 'view' so that I can interact with the maps.

tmap_mode('view')

Import the data

Import shp files

Next, I will import the dataset into the environment.

msia_map <- st_read(dsn = "data", layer = "MYS_adm2")
Reading layer `MYS_adm2' from data source 
  `C:\Users\jaspe\OneDrive\Documents\2_Data Science\98_my-blog\_posts\2023-02-12-distance-spatial-weights\data' 
  using driver `ESRI Shapefile'
Simple feature collection with 144 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 99.64072 ymin: 0.855001 xmax: 119.2697 ymax: 7.380556
Geodetic CRS:  WGS 84

Next, I will visualize the Malaysia map.

tm_shape(msia_map) +
  tm_polygons()

Good! Now, we can proceed and find the “neighbours”.

Distance-based spatial weight

In this sub-section, I will derive the distance-based spatial weights.

Distance-based weights need one point per district to measure distances from, so I will find the centroid of each administrative district. The centroids will also be used to plot the neighbour links later.

longitude <- map_dbl(msia_map$geometry, ~st_centroid(.x)[[1]])
latitude <- map_dbl(msia_map$geometry, ~st_centroid(.x)[[2]])
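For intuition about what a centroid is, below is a sketch of the standard planar centroid formula for a single simple polygon (the shoelace formula). It is an illustration only: `polygon_centroid` is a hypothetical helper, it ignores multipolygons and holes, and sf may compute centroids differently, for example on the sphere when the data use longitude/latitude.

```r
# Hypothetical helper: area-weighted centroid of a simple polygon in planar coordinates.
# Vertices are given in order (clockwise or anticlockwise); the ring is closed implicitly.
polygon_centroid <- function(x, y) {
  x_next <- c(x[-1], x[1])           # next vertex, wrapping around
  y_next <- c(y[-1], y[1])
  cross <- x * y_next - x_next * y   # shoelace terms
  area <- sum(cross) / 2             # signed area (sign cancels below)
  c(x = sum((x + x_next) * cross) / (6 * area),
    y = sum((y + y_next) * cross) / (6 * area))
}

# A 4 x 2 rectangle: its centroid is the centre (2, 1)
rect_centroid <- polygon_centroid(c(0, 4, 4, 0), c(0, 0, 2, 2))
rect_centroid
```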

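Once the longitude and latitude are derived, I will use the cbind function to combine them into a two-column matrix, with longitude in the first column and latitude in the second, which is the order expected when distances are later computed with longlat = TRUE.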

coords <- cbind(longitude, latitude)

K Nearest Neighbours

I will pass the coords object into the knearneigh function, setting k = 4 so that each district gets its four nearest neighbours.

kneigh <- knearneigh(coords, k = 4)

According to the documentation, the function returns a knn object whose nn element is a matrix: row i holds the indices of the k nearest neighbours of point i.

I have sliced the first 10 rows of this matrix to see who the first 10 districts' neighbours are.

kneigh$nn[1:10, 1:4]
      [,1] [,2] [,3] [,4]
 [1,]    8    3    6    9
 [2,]    5    4    9    3
 [3,]    1    5    7    9
 [4,]    2    5    9    7
 [5,]    9    2    4    3
 [6,]   36    8   44   10
 [7,]    3    4    5   10
 [8,]    6    1   36   10
 [9,]    5    2    3    1
[10,]    6    8   54   44

Once we have found the neighbours, we will use the knn2nb function to convert the knn object into a neighbour list (nb) object, so that we can use the plot function to visualize the results.

knn <- knn2nb(kneigh)

Fantastic!

I will then pass the objects to the plotting function to visualize the results.

plot(msia_map$geometry, border = "lightgrey")
plot(knn, coords, add = TRUE, col = "blue")

Note that under this approach, every district has the same number of neighbours (k = 4).
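One subtlety: although every district has exactly k neighbours, the K nearest neighbours relation is not necessarily symmetric. District A can be among B's nearest neighbours without B being among A's. The base-R sketch below (hypothetical toy points, k = 1, not spdep's code) checks this and then builds a symmetric version by adding the missing reverse links, after which the neighbour counts are no longer equal. In spdep, knn2nb has a sym argument (default FALSE) for this choice.

```r
# Toy example (hypothetical points): k = 1 nearest neighbour on a line.
toy_coords <- cbind(x = c(0, 1, 3, 10), y = 0)
d <- as.matrix(dist(toy_coords))
diag(d) <- Inf                      # exclude self-matches
n <- nrow(toy_coords)
k <- 1

# nb-style list: element i holds the indices of i's k nearest neighbours
nb_knn <- lapply(seq_len(n), function(i) order(d[i, ])[seq_len(k)])

# Symmetric only if "j is a neighbour of i" always implies "i is a neighbour of j"
is_symmetric <- all(vapply(seq_len(n), function(i) {
  all(vapply(nb_knn[[i]], function(j) i %in% nb_knn[[j]], logical(1)))
}, logical(1)))
is_symmetric

# Symmetrise by adding every reverse link that is missing
nb_sym <- lapply(seq_len(n), function(i) {
  points_to_i <- which(vapply(nb_knn, function(v) i %in% v, logical(1)))
  sort(unique(c(nb_knn[[i]], points_to_i)))
})
lengths(nb_knn)  # every point has exactly k = 1 neighbour
lengths(nb_sym)  # counts differ once reverse links are added
```

Here point 3's nearest neighbour is point 2, but point 2's nearest neighbour is point 1, so the relation is asymmetric.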

Fixed distance

Now, let’s move on to another distance-based spatial weight approach, which is fixed distance.

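To do so, I will use the dnearneigh function.

I need to specify the lower and upper distance bounds in the function.

Here, I set the lower and upper distance bounds to 0 and 100 respectively. Since longlat = TRUE, the coordinates are treated as longitude/latitude in decimal degrees and the distances are measured in kilometres.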

fixed_d <- dnearneigh(coords, 0, 100, longlat = TRUE)
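Because longlat = TRUE, the bounds are in kilometres rather than degrees. To show why that matters, here is a sketch of the haversine great-circle distance in base R. It assumes a spherical Earth with a mean radius of 6371 km, so it is an approximation that may differ slightly from the distances spdep computes internally; `haversine_km` is a hypothetical helper and the example coordinates are arbitrary.

```r
# Hypothetical helper: great-circle distance in km between lon/lat points (decimal degrees),
# assuming a spherical Earth with mean radius 6371 km (haversine formula).
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(sqrt(pmin(1, a)))  # pmin guards against rounding above 1
}

# Moving one degree north along a meridian covers about 111 km,
# so a 100 km band is narrower than one degree of latitude.
one_degree_km <- haversine_km(101, 3, 101, 4)
one_degree_km
one_degree_km <= 100
```

Had the coordinates been treated as planar, an upper bound of 100 would mean 100 degrees, which would connect every pair of districts, since the bounding box printed earlier spans under 20 degrees of longitude and about 6.5 degrees of latitude.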

The spdep package offers the card function to count the number of neighbours of each area.

card(fixed_d)
  [1] 11  6 10  5  7 15  8 14  7 15 18 18 15 19 11 19  6 11 18 15 18
 [22] 19 10 10 11 10 14 13 10 12 12  9 20  6 17 16 15 24 22 23 19 22
 [43] 22 18 20 18 11  8  7  8 11  6 16 11 16 12 10 16 18 17 12 15 15
 [64] 11  9  9 16 16 18 19 16 20 11  5 13  3 11 11  9 10  3  4  5  4
 [85] 12 11  5 10 12  3  3  8 12  4  9  5  9  8  6  0 11  2  9  9 14
[106] 11  0  6  6  3  7  4  1 11 13  1  6 12  8 12 11  9  5 12  8  6
[127]  8  3 19 21 18 16 16 16 18 13 20 14  7 11  6  8  6 12

Oh no! Two of the districts (positions 100 and 107 in the output above) don't have any neighbours within 100 km, as they are far from every other district.

So if we visualize the result with our usual plotting function, we will see two nodes in the graph that don't have any edges.

plot(msia_map$geometry, border = "lightgrey")
plot(fixed_d, coords, add = TRUE, col = "blue")
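Rather than spotting the isolated districts on the plot, we can find them from the neighbour counts. Below is a base-R sketch of that count. It assumes spdep's convention that a region with no neighbours is stored in the nb list as a single 0L; the helper `count_neighbours` and the toy list are hypothetical. On the real object, which(card(fixed_d) == 0) should return the positions 100 and 107 that show zeros in the card output above.

```r
# Hypothetical helper mirroring what card() reports for an nb-style list.
# Assumes the spdep convention that a region without neighbours is stored as 0L.
count_neighbours <- function(nb) {
  vapply(nb, function(v) {
    if (length(v) == 1L && v[1] == 0L) 0L else length(v)
  }, integer(1))
}

toy_nb <- list(2L, c(1L, 3L), 2L, 0L)  # region 4 has no neighbours
counts <- count_neighbours(toy_nb)
counts
which(counts == 0)  # indices of isolated regions
```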

To overcome this issue, we could increase the upper distance bound.

To make sure every district has at least one neighbour, the upper bound must be at least as large as the largest nearest-neighbour distance. In this example, I will take the maximum distance across the K nearest neighbours links found earlier (k = 4) as the upper distance bound.

Then I will use the nbdists function to calculate the length of each link. Because longlat = TRUE, these are great-circle distances in kilometres rather than Euclidean distances.

dist_new <- nbdists(knn, coords, longlat = TRUE)

Then I will unlist the object, turning the list of link distances into a single numeric vector, before passing it to the summary function to get the summary statistics.

dist_new_unlist <- unlist(dist_new)
summary(dist_new_unlist)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  8.261  29.384  40.714  45.795  56.039 163.055 

Great!

From the summary statistics above, the longest link among the K nearest neighbours is about 163.055 km.
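Why does this bound work? Each district's k nearest neighbours all lie within the largest k-nearest-neighbour distance, so a band whose upper bound is at least that maximum gives every district at least k neighbours. The base-R sketch below checks this on hypothetical toy points with planar distances; it illustrates the reasoning rather than reproducing spdep.

```r
toy_coords <- cbind(x = c(0, 1, 3, 10), y = 0)  # hypothetical points
k <- 2
d <- as.matrix(dist(toy_coords))
diag(d) <- Inf                                   # exclude self-distances

# Distance from each point to its k-th nearest neighbour
kth_nn_dist <- apply(d, 1, function(row) sort(row)[k])
upper <- max(kth_nn_dist)                        # analogue of the 163.055 km above
upper

# Neighbours of each point inside the band (0, upper]
n_within <- rowSums(d > 0 & d <= upper)
all(n_within >= k)  # every point keeps at least k neighbours
```

Note the trade-off: the far-away point forces a large bound, which in turn gives the well-connected points more neighbours than they need (points 2 and 3 end up with three each).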

Taken from giphy

Now, we can find the neighbours using the updated upper distance bound, rounded up slightly to 163.1 km so that the longest link is still included.

fixed_d_new <- dnearneigh(coords, 0, 163.1, longlat = TRUE)

plot(msia_map$geometry, border = "lightgrey")
plot(fixed_d_new, coords, add = TRUE, col = "blue")

Now all the districts have “neighbours”.

We can also print the created object to see a summary.

fixed_d_new
Neighbour list object:
Number of regions: 144 
Number of nonzero links: 3120 
Percentage nonzero weights: 15.0463 
Average number of links: 21.66667 

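From the result, we note that each district now has about 21.7 neighbours on average, far more than the 4 neighbours per district under the K nearest neighbours approach, and about 15% of the entries in the weights matrix are non-zero.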
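The summary figures follow from simple arithmetic on the link count: the average number of links is the number of links divided by the number of regions, and the percentage of nonzero weights is the number of links divided by the number of cells in the full 144 × 144 weights matrix. A quick check in R using the numbers printed above:

```r
n_regions <- 144   # "Number of regions" from the printed summary
n_links <- 3120    # "Number of nonzero links"

avg_links <- n_links / n_regions               # about 21.67 links per district
pct_nonzero <- 100 * n_links / n_regions^2     # about 15.05 % of the weights matrix

# For comparison, the k = 4 nearest-neighbour object has exactly 4 links per district
knn_links <- 4 * n_regions                     # 576 links
c(avg_links = avg_links, pct_nonzero = pct_nonzero, knn_links = knn_links)
```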

Conclusion

That’s all for the day!

Thanks for reading the post until the end.

Feel free to contact me through email or LinkedIn if you have any suggestions on future topics to share.

Refer to this link for the blog disclaimer.

Till next time, happy learning!

Photo by Dariusz Sankowski on Unsplash

References

Rey, Sergio J., Dani Arribas-Bel, and Levi J. Wolf. 2020. “Spatial Weights.” https://geographicdata.science/book/notebooks/04_spatial_weights.html#introduction.